Added ANTLR parse tree persistent caching for procedures #4547
manisha-deshpande wants to merge 13 commits into babelfish-for-postgresql:BABEL_5_X_DEV from
Force-pushed from 0e2dd88 to 751d3e8
create_date SYS.DATETIME NOT NULL,
modify_date SYS.DATETIME NOT NULL,
definition sys.NTEXT DEFAULT NULL,
antlr_parse_tree JSONB DEFAULT NULL, -- JSONB serialized ANTLR parse tree for caching
The babelfish_function_ext regression test fails: the new column can be queried from the PostgreSQL side but not from the Babelfish side, because JSONB is an unsupported data type there.
Should the column type be TEXT instead? Would that affect storage space?
--- /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./expected/babelfish_function_ext-vu-cleanup.out 2026-02-06 18:16:42.761144533 +0000
+++ /home/runner/work/babelfish_extensions/babelfish_extensions/test/JDBC/./output/babelfish_function_ext-vu-cleanup.out 2026-02-06 18:42:00.808212640 +0000
@@ -52,7 +52,7 @@
-- babelfish_function_ext entry should have been removed after dropping all these functions/procedure
SELECT * FROM sys.babelfish_function_ext WHERE funcname LIKE 'babel_2877_vu_prepare%';
GO
-~~START~~
-varchar#!#varchar#!#nvarchar#!#text#!#text#!#bigint#!#bigint#!#datetime#!#datetime#!#ntext
-~~END~~
+~~ERROR (Code: 33557097)~~
+
+~~ERROR (Message: data type jsonb is not supported yet)~~
You can ignore this error for now. We can later add a TDS sender function for JSONB (which just sends it as JSON).
Should the column type be TEXT instead? Would that affect storage space?
Yes. JSONB allows fast lookups compared to JSON/TEXT, which would require a deserialization step of their own (and that becomes a problem for bigger procedures).
Force-pushed from 751d3e8 to 6bb6abf
 *
 * This header provides the interface for serializing and deserializing
 * ANTLR PLtsql parse trees to/from JSONB format. The serialized data is
 * stored in the cross-session cache (babelfish_func_ext) to enable faster
It's a catalog rather than a cache, even though the catalog itself will be cached.
/* Read the value for this key */
tok = JsonbIteratorNext(&ctx.it, &v, false);
robverschoor left a comment:
added some comments
Force-pushed from 6316c43 to ffb844e
Two resolved (outdated) review threads on contrib/babelfishpg_tsql/sql/upgrades/babelfishpg_tsql--5.5.0--5.6.0.sql
Force-pushed from 84d3a71 to f7a3051
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…ble and type Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…E, TRY Statements Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
…d retrieval Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Add cross-session ANTLR parse tree caching for T-SQL stored procedures. Serialized parse trees and datums are stored in babelfish_function_ext using nodeToString/stringToNode. On procedure execution, cached results are restored to skip ANTLR re-parsing. Cache reads validate the stored bbf_version and modify_date before deserializing, skipping stale entries from different Babelfish versions or procedures modified with the GUC disabled.
Changes:
- Add antlr_parse_tree_text, antlr_parse_tree_datums, antlr_parse_tree_modify_date, and antlr_parse_tree_bbf_version columns to sys.babelfish_function_ext
- Store the serialized parse tree and version in pltsql_store_func_default_positions
- Restore and validate the cached parse tree in the new function pltsql_restore_func_parse_result, invoked before the ANTLR parse
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
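The read-side validation described in this commit can be sketched as a pure predicate. This is a standalone illustration under stated assumptions, not the PR's code: CachedParseEntry and cache_entry_is_usable() are hypothetical names, and modify_date is modeled as a plain int64_t rather than a DATETIME column value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of the cache columns of one
 * sys.babelfish_function_ext row (field comments name the real columns). */
typedef struct CachedParseEntry
{
    const char *parse_tree_text;   /* antlr_parse_tree_text; NULL if not cached */
    const char *bbf_version;       /* antlr_parse_tree_bbf_version */
    int64_t     tree_modify_date;  /* antlr_parse_tree_modify_date (simplified) */
} CachedParseEntry;

/*
 * Returns true only if the cached tree may be deserialized: it exists,
 * it was written by the currently running Babelfish version, and it was
 * captured at the procedure's current modify_date. Any failure means the
 * caller should fall back to a normal ANTLR parse.
 */
static bool
cache_entry_is_usable(const CachedParseEntry *entry,
                      const char *running_bbf_version,
                      int64_t proc_modify_date)
{
    if (entry == NULL || entry->parse_tree_text == NULL)
        return false;       /* nothing cached */

    if (entry->bbf_version == NULL ||
        strcmp(entry->bbf_version, running_bbf_version) != 0)
        return false;       /* node layout may differ across versions */

    if (entry->tree_modify_date != proc_modify_date)
        return false;       /* procedure changed after the tree was stored */

    return true;
}
```

In this model, a procedure altered while caching was disabled keeps its old antlr_parse_tree_modify_date, so the date comparison alone rejects the entry (assuming the cache write path is the only place that column is updated).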
Add bbf_version validation, exec-time cache repopulation, and rename/alter/dependency invalidation logic and tests Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
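The cache lifecycle implied by this commit (session hash table first, then the catalog, then a full ANTLR parse with exec-time write-back, plus invalidation on ALTER) can be modeled as a toy state machine. Everything here is hypothetical: CatalogRow, SessionProc, alter_proc() and compile_proc() are illustrative names, modify_date is a plain long, and alter_proc() assumes that invalidation means bumping modify_date and NULLing the cached tree.

```c
#include <stdbool.h>

/* Where a procedure's compiled tree came from on a given call. */
typedef enum ParseSource
{
    FROM_SESSION_HASH,   /* already compiled earlier in this session */
    FROM_CATALOG,        /* deserialized from babelfish_function_ext */
    FROM_ANTLR           /* full ANTLR parse: the slow path */
} ParseSource;

/* Hypothetical catalog row: the procedure's modify_date plus cache columns. */
typedef struct CatalogRow
{
    long modify_date;        /* procedure modify_date */
    bool has_tree;           /* antlr_parse_tree_text IS NOT NULL */
    long tree_modify_date;   /* antlr_parse_tree_modify_date */
} CatalogRow;

/* Hypothetical per-session entry in the PLtsql function hash table. */
typedef struct SessionProc
{
    bool compiled;
    long compiled_modify_date;
    int  antlr_parses;       /* counts slow-path parses in this session */
} SessionProc;

/* Assumed invalidation on ALTER/rename: new modify_date, cached tree dropped. */
static void
alter_proc(CatalogRow *row, long now)
{
    row->modify_date = now;
    row->has_tree = false;
}

static ParseSource
compile_proc(SessionProc *s, CatalogRow *row, bool cache_enabled)
{
    /* 1. Session-local hash table, valid only if the definition is unchanged. */
    if (s->compiled && s->compiled_modify_date == row->modify_date)
        return FROM_SESSION_HASH;

    s->compiled = true;
    s->compiled_modify_date = row->modify_date;

    /* 2. Cross-session catalog entry, if caching is on and the entry is fresh. */
    if (cache_enabled && row->has_tree &&
        row->tree_modify_date == row->modify_date)
        return FROM_CATALOG;

    /* 3. Full parse, then write the result back (exec-time repopulation). */
    s->antlr_parses++;
    if (cache_enabled)
    {
        row->has_tree = true;
        row->tree_modify_date = row->modify_date;
    }
    return FROM_ANTLR;
}
```

In the model, a second session compiling the same procedure gets FROM_CATALOG without paying for an ANTLR parse, and a session that compiled before an ALTER notices the new modify_date and reloads.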
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
Move PLtsql outfuncs/readfuncs code generation entirely to the extension, eliminating the need for PLtsql-specific headers in the engine's gen_node_support.pl input files.
Key changes:
- gen_pltsql_node_support.pl now generates pltsql_nodetags.h with extension-owned T_PLtsql_* NodeTag values (offset from 1000 to avoid collision with the engine's NodeTag enum). Includes an ABI stability check that fails the build if node types are added without updating $last_nodetag/$last_nodetag_no.
- Wrapper files pltsql_outfuncs.c and pltsql_readfuncs.c mirror the engine's pattern: they #include the generated static functions and switch fragments, and expose public pltsql_outNode() and pltsql_parseNodeString() dispatch functions.
- pltsql_serialize_macros.h provides WRITE_*/READ_* macros replicated from engine internals (not exposed in any PG header).
- pl_handler.c registers outNode_hook and parseNodeString_hook in _PG_init() so the engine's outNode()/parseNodeString() delegate to extension code for PLtsql node types.
- pltsql.h includes the generated pltsql_nodetags.h for the T_PLtsql_* defines.
- Makefile updated: compiles the wrapper .o files (not the gen .o files directly), with proper dependency rules for generated files.
Files changed:
- src/pltsql_serialize/gen_pltsql_node_support.pl: nodetags generation + ABI check
- src/pltsql_serialize/pltsql_outfuncs.c: new wrapper
- src/pltsql_serialize/pltsql_readfuncs.c: new wrapper
- src/pltsql_serialize/pltsql_serialize_macros.h: shared macros
- src/pltsql_serialize/pltsql_node_stubs.c: custom read/write nodes
- src/pltsql.h: include pltsql_nodetags.h
- src/pl_handler.c: register hooks
- Makefile: build rules
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
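The NodeTag layout and hook-style dispatch described in this commit can be illustrated with a small standalone C sketch. Everything here is hypothetical: the tag subset, the ENGINE_NODETAG_MAX bound, and the LAST_PLTSQL_NODETAG(_NO) constants, which stand in for the generator's $last_nodetag/$last_nodetag_no check (the real check runs in the Perl script, not via _Static_assert). sketch_pltsql_out_node() is not the engine's actual outNode_hook signature.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative upper bound for the engine's own NodeTag values (assumption:
 * the real, generated engine enum stays below 1000). */
#define ENGINE_NODETAG_MAX 999

/* Extension-owned tags start at 1000, mirroring the offset in this commit.
 * The names are a hypothetical subset of the generated list. */
typedef enum PLtsqlNodeTag
{
    T_PLtsql_expr = 1000,
    T_PLtsql_row,
    T_PLtsql_recfield,
    T_PLtsql_nsitem
} PLtsqlNodeTag;

/* ABI check in the spirit of $last_nodetag/$last_nodetag_no: inserting a node
 * type in the middle shifts later tag numbers, and previously serialized
 * trees would then decode wrongly, so the build must fail. */
#define LAST_PLTSQL_NODETAG    T_PLtsql_nsitem
#define LAST_PLTSQL_NODETAG_NO 1003
_Static_assert(LAST_PLTSQL_NODETAG == LAST_PLTSQL_NODETAG_NO,
               "PLtsql node tags changed: update LAST_PLTSQL_NODETAG(_NO)");
_Static_assert(T_PLtsql_expr > ENGINE_NODETAG_MAX,
               "PLtsql node tags overlap the engine's NodeTag range");

static bool
is_pltsql_tag(int tag)
{
    return tag >= T_PLtsql_expr && tag <= LAST_PLTSQL_NODETAG;
}

/* Hypothetical dispatcher shaped like an outNode hook: returns true if it
 * serialized the node, false so the engine's own outNode() handles it. */
static bool
sketch_pltsql_out_node(char *buf, size_t len, int tag)
{
    const char *label;

    if (!is_pltsql_tag(tag))
        return false;           /* engine node: not ours */

    switch (tag)
    {
        case T_PLtsql_expr:     label = "PLTSQL_EXPR"; break;
        case T_PLtsql_row:      label = "PLTSQL_ROW"; break;
        case T_PLtsql_recfield: label = "PLTSQL_RECFIELD"; break;
        case T_PLtsql_nsitem:   label = "PLTSQL_NSITEM"; break;
        default:                return false;
    }
    snprintf(buf, len, "{%s}", label);
    return true;
}
```

Keeping the extension's tags in a disjoint numeric range lets one integer switch decide ownership without consulting the engine's enum at all.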
Add sys.enable_routine_parse_cache(TEXT, BOOLEAN) to enable or disable ANTLR parse tree caching for individual functions. Complements the existing global session GUC with per-function granularity.
Changes:
- New antlr_cache_enabled column in babelfish_function_ext (default false)
- Function accepts schema.func, schema.func(argtypes), or func (dbo default)
- Returns BOOLEAN confirming the flag that was set
- Disabling NULLs out cache columns for immediate invalidation
- ALTER PROCEDURE preserves the per-function flag
- DROP PROCEDURE removes the flag with the row
- allow_system_table_mods guard in upgrade SQL for CI compatibility
- Tests covering full signature, simple name, no-schema, custom schema, error cases, ALTER preservation, and DROP cleanup
Task: BABEL-6037 Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>
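A minimal sketch of how the func_identifier argument could be split into schema, name, and an optional argument-type list, assuming a plain unquoted identifier. RoutineIdent and parse_routine_ident() are hypothetical names; the sketch ignores bracketed/quoted identifiers, case folding, and Babelfish's mapping of the logical dbo schema to a physical per-database schema, which a real implementation would need to handle.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of the func_identifier argument. */
typedef struct RoutineIdent
{
    char schema[64];
    char name[64];
    char argtypes[128];   /* text between the parentheses, if any */
    bool has_argtypes;
} RoutineIdent;

/*
 * Accepts "schema.func", "schema.func(argtypes)" or "func"; a missing
 * schema defaults to "dbo". Returns false for malformed input.
 */
static bool
parse_routine_ident(const char *input, RoutineIdent *out)
{
    const char *paren = strchr(input, '(');
    size_t      head_len = paren ? (size_t) (paren - input) : strlen(input);
    const char *dot;

    memset(out, 0, sizeof(*out));   /* also NUL-terminates every buffer */

    if (paren != NULL)
    {
        const char *close = strrchr(paren, ')');
        size_t      args_len;

        if (close == NULL || close[1] != '\0')
            return false;                 /* missing ')' or trailing text */
        args_len = (size_t) (close - paren - 1);
        if (args_len >= sizeof(out->argtypes))
            return false;
        memcpy(out->argtypes, paren + 1, args_len);
        out->has_argtypes = true;
    }

    /* Only the part before '(' may contain the schema separator. */
    dot = memchr(input, '.', head_len);
    if (dot != NULL)
    {
        size_t schema_len = (size_t) (dot - input);

        if (schema_len == 0 || schema_len >= sizeof(out->schema))
            return false;
        memcpy(out->schema, input, schema_len);
        head_len -= schema_len + 1;
        input = dot + 1;
        if (memchr(input, '.', head_len) != NULL)
            return false;                 /* more than one '.' */
    }
    else
        strcpy(out->schema, "dbo");       /* default schema */

    if (head_len == 0 || head_len >= sizeof(out->name))
        return false;
    memcpy(out->name, input, head_len);
    return true;
}
```

For example, "sales.get_totals(int, varchar)" yields schema "sales", name "get_totals" and argtypes "int, varchar", while "my_proc" yields schema "dbo" and no argument list.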
Force-pushed from ae3ac48 to dcf6cb9
Description
BABEL-6037: Cross-session ANTLR parse tree caching for T-SQL stored procedures.
Stored procedures with over a thousand lines (e.g., ~1300 lines) take excessive time on their first execution in each new session because of redundant ANTLR parsing. The PLtsql function hash table is session-scoped, so every new session re-parses from scratch.
This PR introduces persistent ANTLR parse tree caching. Serialized parse trees are stored in new columns in sys.babelfish_function_ext using PostgreSQL's nodeToString()/stringToNode() framework. On first execution in a new session, cached results are deserialized to skip ANTLR re-parsing, with version and modify-date validation to prevent serving stale data.
Issues Resolved
BABEL-6037
Changes
Serialization Infrastructure
- gen_pltsql_node_support.pl code generator (modeled after PG's gen_node_support.pl) produces pltsql_nodetags.h, pltsql_outfuncs_defs.c, and pltsql_readfuncs_defs.c from annotated header files (pltsql_serializable_1.h and pltsql_serializable_2.h, which are copies of pltsql.h and pltsql-2.h)
- T_PLtsql_* NodeTag values are offset from 1000 to avoid collision with the engine NodeTag enum, with an ABI stability check that fails the build if node types are added without updating the last nodetag constants
- pltsql_outfuncs.c and pltsql_readfuncs.c expose the pltsql_outNode() and pltsql_parseNodeString() dispatch functions
- pltsql_serialize_macros.h replicates the engine-internal WRITE_*/READ_* macros locally, since they are not exposed in any PG header (source: postgresql_modified_for_babelfish/src/backend/nodes/readfuncs.c and outfuncs.c)
- pltsql_node_stubs.c provides custom read/write handlers for nodes requiring special serialization logic (flexible array members, runtime-only fields, string/int arrays). Nodes marked custom_read_write:
  - PLtsql_expr: skips runtime-only fields (plan, func, expr_simple_*)
  - PLtsql_nsitem: handles FLEXIBLE_ARRAY_MEMBER for name[]
  - PLtsql_row: handles string/int arrays (fieldnames, varnos) with array_size(nfields)
  - PLtsql_recfield: skips runtime cache fields (rectupledescid, finfo)
- pl_handler.c registers outNode_hook and parseNodeString_hook in _PG_init() so the engine delegates to extension code for PLtsql node types
Catalog Changes
Five new columns in sys.babelfish_function_ext:
- antlr_parse_tree_text TEXT: serialized parse tree (nodeToString output)
- antlr_parse_tree_datums TEXT: serialized datum array
- antlr_parse_tree_modify_date DATETIME: timestamp for staleness detection
- antlr_parse_tree_bbf_version TEXT: Babelfish version at serialization time
- antlr_cache_enabled BOOL: per-function cache flag (default false)
Upgrade SQL in babelfishpg_tsql--5.5.0--5.6.0.sql adds these columns with an allow_system_table_mods guard for CI compatibility.
Cache Lifecycle
- babelfish_function_ext, and populates the in-session PLtsql hash table
- babelfish_function_ext, validates bbf_version and modify_date, and populates the hash table
GUC Configuration
- babelfishpg_tsql.enable_routine_parse_cache (session-level, PGC_USERSET, default false): global toggle for enabling/disabling cache reads and writes
- sys.enable_routine_parse_cache(func_identifier TEXT, enable_flag BOOLEAN): per-function granularity; accepts schema.func, schema.func(argtypes), or func (defaults to dbo); disabling NULLs out the cache columns for immediate invalidation
Build System
- .o files with dependency rules for generated files from gen_pltsql_node_support.pl
- .gitignore
Node Allocation
- palloc0() calls replaced with makeNode() to set the proper NodeTag values required by the serialization framework
Performance Results
Test Scenarios Covered
[TBD]
Use case based -
Boundary conditions -
Arbitrary inputs -
Negative test cases -
Minor version upgrade tests -
Major version upgrade tests -
Performance tests -
Tooling impact -
Client tests -
Check List
By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.